76. (Currently amended) A method for identifying nucleotides for variation in 
nucleic acids encoding a protein variant library in order to affect a desired activity, said method 
comprising: 

(a) receiving data characterizing a training set of a protein variant library, wherein the 
data comprises activity and a nucleotide sequence for each protein 
variant in the training set; 

(b) from the data, developing a computational algorithmic sequence activity model 
that predicts activity as a function of nucleotide types and corresponding position in the 
nucleotide sequence; 

(c) using the sequence activity model to rank positions in a reference nucleotide 
sequence and/or nucleotide types at specific positions in the reference nucleotide 
sequence in order of impact on the desired activity; 

(d) using the ranking to identify one or more nucleotides, in the reference nucleotide 
sequence, that are to be varied or fixed in order to impact the desired activity; and 

(e) generating one or more of the protein variants encoded by the reference 
nucleotide sequence in which the identified nucleotides are varied or fixed in order to 
impact the desired activity. 
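
For illustration only, steps (a) through (d) of claim 76 might be sketched in Python as follows. The claim does not specify a model form, so this sketch assumes a simple additive model in which each (position, nucleotide) pair is scored by the mean activity of the training variants carrying it, relative to the overall mean. The function names, the toy training data, and the "vary"/"fix" labels are all hypothetical and are not taken from the claims.

```python
from collections import defaultdict
from statistics import mean


def fit_additive_model(training_set):
    """Step (b), assumed form: score each (position, nucleotide) pair by the
    mean activity of variants carrying it minus the overall mean activity."""
    overall = mean(activity for _, activity in training_set)
    groups = defaultdict(list)
    for seq, activity in training_set:
        for pos, nt in enumerate(seq):
            groups[(pos, nt)].append(activity)
    return {key: mean(vals) - overall for key, vals in groups.items()}


def rank_positions(model):
    """Step (c): rank positions by the spread between the best and worst
    nucleotide effects observed at each position (largest spread first)."""
    by_pos = defaultdict(list)
    for (pos, _nt), effect in model.items():
        by_pos[pos].append(effect)
    spread = {pos: max(effects) - min(effects) for pos, effects in by_pos.items()}
    return sorted(spread, key=spread.get, reverse=True)


def propose_changes(model, reference, top_n):
    """Step (d): for the top-ranked positions, mark the reference nucleotide
    as one to fix (already the best observed) or to vary (a better one exists)."""
    proposals = []
    for pos in rank_positions(model)[:top_n]:
        candidates = {nt: eff for (p, nt), eff in model.items() if p == pos}
        best = max(candidates, key=candidates.get)
        action = "fix" if best == reference[pos] else "vary"
        proposals.append((pos, reference[pos], best, action))
    return proposals


# Step (a): hypothetical training data, (nucleotide sequence, measured activity).
training = [
    ("ATGC", 1.0), ("ATGA", 1.1), ("CTGC", 3.0),
    ("CTGA", 3.2), ("ATCC", 0.9), ("CTCA", 3.1),
]
model = fit_additive_model(training)
ranking = rank_positions(model)
proposals = propose_changes(model, "ATGC", top_n=2)
print(ranking)
print(proposals)
```

In this toy data set, position 0 dominates the ranking because variants carrying C there show roughly three times the activity of those carrying A. A regression-based model could be substituted for `fit_additive_model` without changing the ranking and selection steps.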

77. (Previously presented) The method of claim 76, wherein the nucleotides to be 
varied are codons encoding particular amino acids. 

78. (Previously presented) The method of claim 77, wherein the activity is a function 
of expression of nucleic acids. 

79. (Currently amended) A computer program product comprising a machine 
readable medium on which is provided program instructions for identifying nucleotides for 
variation in nucleic acids encoding a protein variant library in order to affect a desired activity, 
said instructions comprising: 

(a) code for receiving data characterizing a training set of a protein variant library, 
wherein the data comprises activity and a nucleotide sequence for 
each protein variant in the training set; 

(b) code for developing a computational algorithmic sequence activity model from 
the data, which sequence activity model predicts activity as a function of nucleotide types 
and corresponding position in the nucleotide sequence; 

(c) code for using the sequence activity model to rank positions in a reference 
nucleotide sequence and/or nucleotide types at specific positions in the reference 
nucleotide sequence in order of impact on the desired activity; 

(d) code for generating a ranked list of the nucleotide positions and/or the nucleotide 
types at specific positions in the reference nucleotide sequence; and 

(e) code for using the ranking to identify one or more nucleotides, in the reference 
nucleotide sequence, that are to be varied or fixed in order to impact the desired activity. 

80. (Previously presented) The computer program product of claim 79, wherein the 
nucleotides to be varied are codons encoding particular amino acids. 

81. (Previously presented) The computer program product of claim 79, wherein the 
activity is a function of expression of nucleic acids. 

82-100. (Cancelled) 

101. (New) The method of claim 76, wherein (e) comprises generating a new protein 
variant library wherein the sequences of the members of the new protein variant library comprise 
amino acid residues encoded by the identified nucleotides varied or fixed in order to impact the 
desired activity. 

102. (New) The method of claim 101, wherein (e) comprises expressing the new 
protein variant library from polynucleotides encoding members of the new protein variant library 
and wherein the polynucleotides are prepared by gene synthesis. 

103. (New) The method of claim 101, wherein (e) comprises expressing the new 
protein variant library from polynucleotides encoding members of the new protein variant library 
and wherein the polynucleotides are prepared by mutagenesis. 

104. (New) The method of claim 101, wherein (e) comprises expressing the new 
protein variant library from polynucleotides encoding members of the new protein variant library 
and wherein the polynucleotides are prepared by performing a recombination-based diversity 
generation mechanism. 

105. (New) The method of claim 101, further comprising screening the new protein 
variant library to identify protein variants having the desired activity. 

106. (New) The method of claim 105, further comprising sequencing the identified 
protein variants having the desired activity. 

107. (New) The method of claim 106, further comprising repeating (a)-(c) using the 
activity and sequence data from protein variants in the new protein variant library. 

108. (New) The method of claim 101, wherein the members of the new protein variant 
library comprise the same amino acid sequence encoded by different nucleotide sequences. 

109. (New) A method for identifying nucleotides for variation in a nucleotide 
sequence in order to optimize the expression properties of the nucleotide sequence, said method 
comprising: 

(a) receiving data characterizing a training set comprising a nucleotide sequence and 
a corresponding quantity of protein expressed for each different nucleotide sequence in the 
training set; 

(b) from the data, developing a computational algorithmic model that predicts a 
quantity of protein expressed as a function of nucleotide types and corresponding position in the 
nucleotide sequence; 

(c) using the model to rank positions in a reference nucleotide sequence and/or 
nucleotide types at specific positions in the reference nucleotide sequence in order of impact on 
the quantity of protein expressed; 

(d) using the ranking to identify one or more nucleotides, in the reference nucleotide 
sequence, that are to be varied or fixed in order to impact the quantity of protein expressed; and 

(e) expressing protein from a modified version of the reference nucleotide sequence 
in which the identified nucleotides are varied or fixed in order to impact the quantity of protein 
expressed. 

110. (New) The method of claim 109, wherein some of the nucleotides to be varied 
comprise codons encoding particular amino acids. 

111. (New) The method of claim 109, wherein (e) comprises expressing protein from a 
plurality of polynucleotides corresponding to modified versions of the reference nucleotide 
sequence in which the identified nucleotides are varied or fixed in order to impact the quantity of 
protein expressed. 

112. (New) The method of claim 111, wherein in (e) the polynucleotides are prepared 
by gene synthesis. 

113. (New) The method of claim 111, wherein in (e) the polynucleotides are prepared 
by mutagenesis. 

114. (New) The method of claim 111, wherein in (e) the polynucleotides are prepared 
by performing a recombination-based diversity generation mechanism. 

115. (New) The method of claim 111, further comprising determining whether the 
modified versions of the reference nucleotide sequence impact the quantity of protein expressed. 

116. (New) The method of claim 115, further comprising repeating (a)-(c) using the 
quantity of protein expressed and sequence data corresponding to the modified version of the 
reference nucleotide sequence. 

117. (New) The method of claim 111, wherein the modified versions of the reference 
nucleotide sequence encode the same amino acid sequence. 
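
For illustration only, the situation recited in claim 117, in which modified versions of a reference nucleotide sequence all encode the same amino acid sequence, might be sketched in Python as follows. The sketch assumes the standard genetic code and assumes that the positions selected for variation are whole codons; the function names and the example reference sequence are hypothetical and are not taken from the claims.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(seq):
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def synonymous_variants(reference, codon_positions):
    """Yield every sequence that differs from the reference only by synonymous
    codon choices at the given 0-based codon indices; other codons stay fixed."""
    if len(reference) % 3 != 0:
        raise ValueError("reference length must be a multiple of 3")
    codons = [reference[i:i + 3] for i in range(0, len(reference), 3)]
    choices = [
        SYNONYMS[CODON_TABLE[codon]] if index in codon_positions else [codon]
        for index, codon in enumerate(codons)
    ]
    for combination in product(*choices):
        yield "".join(combination)


# Hypothetical reference encoding Met-Lys-Gly; vary the second and third codons.
reference = "ATGAAAGGT"
variants = list(synonymous_variants(reference, {1, 2}))
print(len(variants), translate(reference))
```

Because every variant translates to the same protein, any difference in the measured quantity of protein expressed can be attributed to the nucleotide sequence itself rather than to a change in the encoded amino acids.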

118. (New) A computer program product comprising a machine readable medium on 
which is provided program instructions for identifying nucleotides for variation in a nucleotide 
sequence in order to optimize the expression properties of the nucleotide sequence, said 
instructions comprising: 

(a) code for receiving data characterizing a training set comprising a nucleotide 
sequence and a corresponding quantity of protein expressed for each different nucleotide 
sequence in the training set; 

(b) code for developing a computational algorithmic model from the data, which 
model predicts a quantity of protein expressed as a function of nucleotide types and 
corresponding position in the nucleotide sequence; 

(c) code for using the sequence activity model to rank positions in a reference 
nucleotide sequence and/or nucleotide types at specific positions in the reference 
nucleotide sequence in order of impact on the quantity of protein expressed; 

(d) code for generating a ranked list of the nucleotide positions and/or the nucleotide 
types at specific positions in the reference nucleotide sequence; and 

(e) code for using the ranking to identify one or more nucleotides, in the reference 
nucleotide sequence, that are to be varied or fixed in order to impact the quantity of 
protein expressed. 

119. (New) The computer program product of claim 118, wherein some of the 
nucleotides to be varied are codons encoding particular amino acids. 